USING VIDEO IMAGE ANALYSIS TO AUTOMATICALLY TRANSMIT 
GESTURES OVER A NETWORK IN A CHAT 
OR INSTANT MESSAGING SESSION 

BACKGROUND OF THE INVENTION 

Field of the Invention 

This invention relates to transmitting gestures in a 
chat session by participants communicating across a network 
of computers, and more specifically, to the use of a video 
camera to capture an actual physical gesture made by a 
participant, and automatically transmitting a textual or 
graphical representation of the captured gesture to the 
other participants in the chat session. 

Description of the Related Art 

As computational devices continue to proliferate 
throughout the world, there also continues to be an increase 
in the use of networks connecting these devices. 
Computational devices include large mainframe computers, 
workstations, personal computers, laptops and other portable 
devices including wireless telephones, personal digital 
assistants, automobile-based computers, etc. Such portable 
computational devices are also referred to as "pervasive" 
devices. The term "computer" or "computational device", as 
used herein, may refer to any such device which contains 
a processor and some type of memory. 

The computational devices may be connected in any type 
of network including the Internet, an intranet, a local area 
network (LAN) or a wide area network (WAN). The networks 
connecting computational devices may be "wired" networks, 
formed using lines such as copper wire or fiber optic cable, 
wireless networks employing earth and/or satellite-based 
wireless transmission links, or combinations of wired and 
wireless network portions. Many such networks may be 
organized using a client/server architecture, in which 
"server" computational devices manage resources, such as 
files, peripheral devices, or processing power, which may be 
requested by "client" computational devices. "Proxy servers" 
can act on behalf of other machines, such as either clients 
or servers. 

A widely used network is the Internet. The Internet, 
initially referred to as a collection of "interconnected 
networks", is a set of computer networks, possibly 
dissimilar, joined together by means of gateways that handle 
data transfer and the conversion of messages from the 
sending network to the protocols used by the receiving 
network. When capitalized, the term "Internet" refers to 
the collection of networks and gateways that use the TCP/IP 
suite of protocols. 

Currently, the most commonly employed method of 
transferring data over the Internet is to employ the World 
Wide Web environment, referred to herein as "the Web". 
Other Internet resources exist for transferring information, 
such as File Transfer Protocol (FTP) and Gopher, but have 
not achieved the popularity of the Web. In the Web 
environment, servers and clients effect data transfer using 
the Hypertext Transfer Protocol (HTTP), a known protocol for 
handling the transfer of various data files (e.g., text, 
still graphic images, audio, motion video, etc.). 

Electronic mail, or e-mail, is a frequently used 
feature of the Internet which allows the sending of messages 
to anyone connected to the Internet or connected to a 
computer network that has a connection to the Internet, such 
as an online service. An Internet e-mail message is sent, 
using the Internet's TCP/IP protocol, as a stream of 
packets, where each packet contains the destination address 
of a mail server used by the intended recipient. When all 
of the packets reach the destination address, the mail 
server recombines them into an e-mail message that a 
recipient can read when the recipient accesses the 
recipient's mailbox at the mail server. 

A more immediate way to communicate with others over 
the Internet is to participate in a "live" chat session. As 
a participant enters text via a keyboard, other participants 
to the chat session can see the text being entered 
immediately. A protocol called Internet Relay Chat (IRC) 
can be used between an IRC client communicating with an IRC 
server on the Internet to effectuate a chat session. A 
participant using a client logs onto a server and selects a 
channel on which the participant wants to chat. As a 
participant types a message on a keyboard, the message, as 
it is being entered, is sent to the server. The server is 
part of a global IRC server network, and sends the message 
to the other servers which send the message to all of the 



AUS920000683U^^ - 4 - PATENT 



others participating on the same channel. Other chat 
sessions can be effectuated without using the IRC protocol. 
For example, proprietary chat software can be used by 
individual Web sites to enable visitors to the site to 
communicate with each other in a live chat session. 

Instant messaging is another way to communicate with 
other participants in "real time". Instant messaging is 
different from the live chat sessions discussed above in 
that instant messaging enables a participant to communicate 
privately with another person. A user can create special 
lists of "buddies". When a "buddy" comes on line, the other 
buddies are notified. They can then participate in 
communicating with each other. 

It should be noted that although these "real time" 
forms of communicating are referred to as "chat" sessions, 
the communication is in the form of transferring inputted 
text, such as via a keyboard, and does not typically include 
"auditory", i.e., voice, communication. 

It is possible, however, to communicate in an auditory 
fashion over the Internet network, also. In this way, the 
sound of the participants' voices is broken down into 
packets which are then delivered using the Internet's TCP/IP 
protocols. Auditory communication over the Internet can be 
carried out in many ways. In one way, referred to as 
Internet telephony, the communication is made in a manner 
similar to a telephone, but the call is routed over the 
Internet instead of through the phone service. In another 
way, the communication is carried out through computers, 
connected to the Internet, having special hardware (e.g., 
microphones, speakers, etc.) and software. In this way, not 
only may audio communication be made, but text or graphics 
may also be sent between the participants using the 
computer's display monitor and other attached input and 
output devices. In addition, systems are also known in the 
prior art to utilize a camera as a computer input device to 
communicate video images and audio over the Internet. 

Regardless of these other types of video or audio 
communication means, the most prevalent communications means 
at the present time utilizes typed text such as is used in 
chat sessions or instant messaging as discussed above. The 
problem with typed text, however, is that all that is 
communicated are the words themselves. The words themselves 
do not necessarily communicate all of the information that 
can be conveyed in a real live conversation which the live 
Internet chat session is trying to model. Typically, in a 
face to face communication, a person listens to the tone of 
the communicated words, and observes any associated body 
language, in order to interpret the meaning of the 
communication and to gather all of the communicated message. 
This is absent in chat sessions and instant messaging. 

To compensate for this, emoticons are frequently used. 
Emoticons have emerged in connection with live chat sessions 
and instant messaging in order to enable a participant to 
further communicate the participant's tone, emotion, or 
feelings in connection with any typed words that are 
communicated. For example, :) is an emoticon which conveys 
that the participant sending the communication is smiling or 
happy. This can be used to give the communicated words a 
sarcastic or joking inflection. Likewise, the emoticon :( 
conveys an "unhappy" emotion such as sadness or 
disappointment or dislike in something that was 
communicated. The emoticon :-D may be used to indicate that 
the person is laughing; and the emoticon ;-) may be used to 
indicate that what the person said was said with a wink. A 
wide range of other emoticons are also known and used. 

Avatars are also used in chat room software. An avatar 
is a graphical animation that represents a participant. An 
avatar comes and goes from the display screen of the 
participants as the participant that it represents comes and 
goes from the chat session. 

As shown above, emoticons are used frequently in live 
chat sessions on the Internet to convey gestures, such as a 
smile, a wink, a frown, etc. Unfortunately, a participant 
has to first contemplate the type of gesture that the 
participant is making (e.g., the participant may have to 
momentarily stop to think "Am I smiling?", "Is my head 
nodding in agreement?", etc.); and then type in a 
combination of characters to create an emoticon to reflect 
that gesture. Likewise, for avatars, specific scripts or 
commands have to be selected by a participant in order to 
control the presentation or animation of the avatar to the 
other participants. It would therefore be desirable if 
gestures could be conveyed in a live chat session or instant 
messaging communication in a more automated fashion in order 
to immediately convey the actual gestures being made by a 
participant. Presently, there has not been a way to 
automatically convert an actual physical gesture of a 
participant in a chat room to a form that can command the 
chat room software. 

SUMMARY OF THE INVENTION 

It is therefore an object of the invention to 
automatically generate a representation of an actual 
physical gesture made by a participant in a "real time" 
communication over the network. 

The system, method and program of the invention 
automatically generate input into chat room software that 
represents an actual physical gesture made by a participant 
in a real time communication over a network, such as a 
"live" chat session or an instant messaging communication. 
The system comprises automatic gesture software in 
combination with image processing software that can analyze 
captured video frames. A video camera, utilized in 
connection with the participant's computer system, captures 
the real time gestures made by the participant, such as a 
wave, a shoulder shrug, or a nodding of the head, and inputs 
the captured video images into the computer system of the 
participant. The image processing software analyzes the 
captured video images, received as input, of a participant. 
When a gesture is detected, the computer system accesses a 
database to find a corresponding graphic or text 
translation, such as an emoticon or a text description or 
animation of an avatar, and inserts the translation into the 
participant's dialogue in the live chat session in 
accordance with the command interface to the chat room 
software. In this way, a representation of the 
gesture is automatically generated and can be inserted 
within a communication from one participant to each of the 
other participants in an on-line chat session within the 
network of computers. 

It should be noted that the present invention may also 
be implemented in audio communications made over the 
computer network wherein the translated gesture is displayed 
on a display device in conjunction with the audio 
communication. Although it is foreseeable that technology 
will support full audio and video image transmissions for 
chat rooms, it is anticipated that at least some chat rooms 
will continue to be carried out without such technology in 
order to protect the anonymity of the participants. In this 
regard, the automatic transmission of gestures of the 
present invention will continue to be advantageous. 

In addition to the advantage of being able to 
automatically transmit a translation of a chat room 
participant's actual physical gestures, another advantage of 
the present invention also exists. With the present 
invention, culturally dependent gestures are interpreted in 
the context of the user, thereby minimizing any chance of 
misunderstanding. For example, the victory "V" sign, which 
is a vulgarity in Australia and Latin countries, would be 
interpreted as "victory" and transmitted as such with the 
appropriate action for "victory". 

BRIEF DESCRIPTION OF THE DRAWINGS 



For a more complete understanding of the present 
invention and the advantages thereof, reference should be 
made to the following Detailed Description taken in 
connection with the accompanying drawings in which: 

Fig. 1A illustrates the hardware components of a 
computer system for a participant using chat room software 
and the automatic gesture software of a preferred embodiment 
of the present invention; 

Fig. 1B illustrates the software components of a 
network of computers including at least two computer systems 
which enable a participant at each computer system to 
communicate with other participants over the network by 
utilizing the automatic gesture software of a preferred 
embodiment of the present invention; 

Fig. 2 illustrates logic for automatically transmitting 
gestures over a network; 

Fig. 3 illustrates an exemplary table of gestures with 
associated actions; 

Fig. 4 illustrates a block diagram of a data processing 
system in which the present invention may be implemented; 
and 

Fig. 5 is a block diagram illustrating a software 
organization within a data processing system in accordance 
with a preferred embodiment of the present invention. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

In the following description, reference is made to the 
accompanying drawings which form a part hereof, and which 
illustrate several embodiments of the present invention. It 
is understood that other embodiments may be utilized and 
structural and operational changes may be made without 
departing from the scope of the present invention. 

With reference now to the figures, a preferred 
embodiment of the invention is described. Fig. 1A 
illustrates the hardware components of a computer system for 
a participant using chat room software and the automatic 
gesture software of a preferred embodiment of the present 
invention. The computer system comprises a processing 
system unit 121 with connections to a video camera 115, a 
display monitor 114, keyboard 112, and mouse 106. Video 
camera 115 may be a digital video input device or a 
conventional analog video camera connected to a video 
capture device, which are known in the art. Also included 
are storage devices 108, which may include floppy drives and 
other types of permanent or removable storage devices. 
Optional input/output devices may include speakers 113L, 
113R, and microphone 116. Speakers become necessary in 
those embodiments where an action for a gesture produces 
sound. Alternatively, speakers 113L and 113R may be 
replaced with headphones or other audio output devices. 
Likewise, a microphone may become necessary if a participant 
configures a gesture to have an associated sound as produced 
at the participant's system. The associated sound could 
also be directly transmitted without the microphone. As 
such, the microphone may be replaced with other audio 
input devices such as a digital music keyboard or 
synthesizer. It should be noted, also, that the speakers 
and microphone become necessary in those embodiments wherein 
the communication between participants over the network 
includes audio communication. Other input/output devices 
may also be attached to system 101 such as modems, printers, 
etc. 

Fig. 1B illustrates the software components of a 
network of computers including at least two computer systems 
which enable a participant at each computer system to 
communicate with other participants over the network by 
utilizing the automatic gesture software of a preferred 
embodiment of the present invention. Data processing system 
network 100 includes at least two computer systems (e.g., 
client computer systems) 110, 120 which enable a participant 
at each computer system to communicate with each other and 
other participants over the network 108. Data processing 
network 100 also includes one or more servers 130, 140 which 
are accessible as part of the Internet 108 or other network. 
Computer systems 110, 120 are enabled to access servers 130, 
140. At least one of the servers 130 hosts a Web site 131 
that utilizes chat software. Examples of chat software 
include a) "ichat" which is client and server software for 
accessing and running chat sites (IRC, MUDs, telenet) with 
extensions including World Wide Web (WWW) integration (see 
ichat.com); b) "chatblazer" which is chat software for Web 
sites (see chatblazer.com); and c) "volanochat" which is 
client and server Java software for providing chat at Web 
sites or running a world wide chat network from Volano (see 
volano.com). Other software is also currently available to 
provide a chat session over a network, e.g., the Internet. 

It should also be noted that content on the servers 
130, 140 may be accessed by clients 110, 120 using any of a 
variety of messaging system protocols including Hypertext 
Transfer Protocol (HTTP), File Transfer Protocol (FTP), 
Network News Transfer Protocol (NNTP), Internet Message Access 
Protocol (IMAP), Internet Relay Chat (IRC), or Post Office 
Protocol (POP), etc. 

In accordance with the present invention, clients 110, 
120 within data processing system network 100 each include 
a messaging system client application 119 (e.g., a browser), 
capable of transmitting and receiving messages containing 
commands to and from a messaging system server application 
139, 149 within servers 130, 140, respectively. Commands 
may be issued by client application 110 to server 
application 139 in order to cause some operation to be 
performed by server 130. Client 110 may execute one or more 
user applications 118, either within browser application 119 
or apart from browser application 119, which are capable of 
sending and retrieving data over the Internet 108 to and 
from servers 130 or 140. Such user application(s) 118 
include client side chat software or other client side 
software that enables the participant at client 110 to 
communicate with other participants over the network, e.g., 
via a server 130. Either in combination with or separate 
from software 118 is client side software 117 which enables 
video images from video camera 115 to be captured and 
analyzed for any one of a plurality of gestures made by the 
participant. Software 117 furthermore determines a chat 
room command, e.g., a graphic or text representation, of the 
gesture and sends it via the communication sent from the 
participant using chat software 118. 

Likewise, client 120 has similar hardware components as 
for client 110 (such as client 101 in Fig. 1A), and similar 
software components including a browser application 119, 
chat room software 118, and automatic gesture software 117. 

Automatic gesture software 117 comprises image 
processing and computer vision software to analyze images 
from video camera 115. One source of image processing and 
computer vision software is Amerinex Applied Imaging which 
specializes in computer vision products such as Aphelion and 
KBVision (see their Web site on the World Wide Web (WWW) 
using the HTTP protocol at aai.com). The imaging software 
analyzes various features of a participant from captured 
video frames generated by video camera 115. For example, 
the imaging software may discern any one or more of the 
following features including, but not limited to, the head, 
eyes, mouth (lips), shoulders, arms, and hands. For 
example, the imaging software can detect whether the head 
nods up and down in successive frames, or if there is a 
prolonged "wink" in one eye, or if the mouth makes a smile 
or frown, or if the shoulders "shrug", or if an arm or hand 
moves across the captured video frames such as in depicting 
a wave or other gesture. 

Automatic gesture software 117 also comprises a 
scripting language for describing the gestures and relating 
them to commands such as chat software commands. The 
scripting language has recognition events and corresponding 
actions. In a preferred embodiment the syntax takes the 
following form: 

command(state, action(parameters for the action)) 

For example, if the imaging software detects a left to 
right wave, then the scripting language will define this as 
a command having an action of announcing "hello, folks" 
where "hello, folks" is the parameter for the announce 
action. This parameter is used if the state is equal to 1, 
meaning that this is the first time that the imaging 
software detected a wave from the left to the right. As 
such, for this example, the automatic gesture software would 
generate the following: 

onLeftToRightWave(1, announce("hello, folks")) 

As such, when the participant waves a hand left to 
right for the first time (state 1 being the initial state), 
the automatic gesture software converts this into chat room 
parlance - "hello, folks". Depending upon the embodiment, 
and/or depending upon selection options offered to the 
participant by the automatic gesture software, the physical 
gesture can be converted into an auditory communication 
(i.e., by announcing "hello, folks"), into a textual 
communication (such as by inserting text into the written 
conversation stating that participant #X states "hello, 
folks"), or into a graphic image of an arm waving to be 
displayed on the participants' graphic display monitors. 
In addition, some chat software shows a graphical 
representation or animated avatar of each participant 
participating in the chat room. As such, the command from 
the physical gesture could be to wave the arm in the 
animation of the participant displayed on the monitor. 

As a further example, if the imaging software 
subsequently detects another waving of the hand from left to 
right, the automatic gesture software would take the action 
to announce or insert text or graphic into the communication 
to indicate in chat room parlance - "Yeah, I'm still here". 
The recognition event and corresponding action may be 
depicted by the scripting language as 

onLeftToRightWave(*, announce("Yeah, I'm still here.")) 

where * is any subsequent state after the initial (1) state. 

It should be noted that the above described scripting 
language is for illustrative purposes, and that other 
embodiments may utilize a scripting language constructed in 
another way and format while still being within the breadth 
and scope of the present invention. 
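
By way of illustration only, the command syntax shown above could be read into a structured form as in the following Python sketch. The names GestureRule and parse_rule, and the regular expression itself, are hypothetical and are not part of the described software; the sketch assumes a single quoted string parameter and a state that is either a number or the * wildcard.

```python
import re
from typing import NamedTuple


class GestureRule(NamedTuple):
    event: str      # recognition event, e.g. "onLeftToRightWave"
    state: str      # "1" for the first occurrence, "*" for later ones
    action: str     # e.g. "announce"
    parameter: str  # content handed to the action


# Assumed shape: event(state, action("parameter"))
_RULE = re.compile(
    r'^\s*(?P<event>\w+)\s*\(\s*(?P<state>\*|\d+)\s*,\s*'
    r'(?P<action>\w+)\s*\(\s*"(?P<parameter>[^"]*)"\s*\)\s*\)\s*$'
)


def parse_rule(line: str) -> GestureRule:
    """Parse one scripting-language line into its four parts."""
    match = _RULE.match(line)
    if match is None:
        raise ValueError(f"not a gesture rule: {line!r}")
    return GestureRule(**match.groupdict())


rule = parse_rule('onLeftToRightWave(1, announce("hello, folks"))')
print(rule.event, rule.state, rule.action, rule.parameter)
```

Other embodiments could equally use a different grammar or a full parser; the regular expression is chosen here only to keep the sketch short.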

Fig. 2 illustrates the logic of a preferred embodiment 
of the invention. When the gesture software is invoked, 
201, then all states are set to zero (0), 202. Upon each 
invocation of the automatic gesture software, the scripting 
language is initialized in order to set up the gesture 
recognition state. For example, all states are initialized 
to zero in order to then determine when a gesture has a 
first occurrence. This is important since a different 
action or translation of the gesture may be made depending 
upon whether it is a first or a subsequent occurrence of the 
gesture. 

The gesture software can be automatically invoked 
whenever the participant begins a chat session, or it can be 
invoked upon selection by the participant. In addition, the 
gesture software can be disabled (and then enabled) at any 
time during a chat session which will not affect the 
initialization states. The gesture software can be disabled 
because there may be times during a chat session that the 
participant does not want the gestures to be automatically 
transmitted. For example, another person may enter into the 
same physical room as the participant which the participant 
begins communicating with while breaking away momentarily 
from the on-line chat session. 
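
The state handling described above can be sketched in Python as follows. The class name GestureSession and its methods are hypothetical. The sketch assumes that the "state" of a gesture is simply its occurrence count since invocation, and that gestures seen while the software is disabled are ignored rather than counted; the description does not specify the latter point.

```python
from collections import Counter


class GestureSession:
    """Tracks how often each gesturing event has occurred since invocation."""

    def __init__(self):
        # Step 202: every gesture state starts at zero.
        self.counts = Counter()
        self.enabled = True

    def disable(self):
        # Disabling leaves the recorded states untouched.
        self.enabled = False

    def enable(self):
        self.enabled = True

    def record(self, event):
        """Return the 1-based occurrence number, or None while disabled."""
        if not self.enabled:
            return None
        self.counts[event] += 1
        return self.counts[event]


session = GestureSession()
print(session.record("onLeftToRightWave"))  # first occurrence
session.disable()
print(session.record("onLeftToRightWave"))  # ignored while disabled
session.enable()
print(session.record("onLeftToRightWave"))  # counting resumes where it left off
```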



After the gesture software is invoked, and the 
initialization states are set, it is then determined whether 
or not to invoke the configuration process, 203. The 
configuration process may take place upon initial 
installation of the software product on the participant's 
computer or at any time the participant invokes the 
software. Of course, once configured, the participant may 
skip the configuration process for any given invocation if 
the gestures and correlated actions are set as desired by 
the participant. Furthermore, if the automatic gesture 
software is already preprogrammed with all of the possible 
gestures and correlated actions; no configuration is 
necessary, or even provided, in some embodiments. 

Otherwise, the participant is prompted as to whether or 
not configuration is desired. The configuration process, in 
some embodiments, may consist of performing a gesture for 
the video camera and then designating the interpretation of 
the gesture including an interpretation based upon when the 
gesture occurs. Depending upon the embodiment, the 
automatic gesture software may provide a set of available 
gestures, e.g., wave hand, smile, frown, wink, shrug, nod, 
for which the user may designate the action (announce, 
insert text, insert graphic,) and the parameter of the 
action (e.g., the content or translation of the gesture). 

Other embodiments may be virtually limitless in the 
gestures that the participant may make and how they may be 
defined or translated. For example, in some embodiments, 
the GUI of the automatic gesture software for the 
configuration task may display to the user a still video 
frame of the user captured at that time from the video 
camera. The user could then select a feature, such as the 
mouth, hand, eye, shoulders, head, etc. The GUI would then 
ask the user to make a gesture using the selected feature. 
The automatic gesture software then analyzes the changes in 
the selected feature throughout a sequence of captured video 
frames of the user making the gesture. The GUI then prompts 
the user to designate a corresponding action such as insert 
laugh, announce "hello, folks", announce "just kidding", 
announce "yes, I agree", etc., for the gesture, 204. As 
such, the automatic gesture software can be configured to 
correlate a gesture with an action. The user can then 
further specify the corresponding action in relation to the 
state of the gesture, i.e., whether it is a first occurrence 
or a subsequent occurrence. Note, however, that there can 
be more states than just "first" and "other" occurrence. 
For example, additional states may include "any, first, 
last, non-first, non-last, null, ifactive, ifinactive," all 
of which are dependent on actual states that a particular 
application may support. 

Then the configuration process stores, in a table, 300, 
as shown in Fig. 3, the gesture with the corresponding 
action, 205, (Fig. 2). As such, at the end of the 
configuration process, an associative table, which maps the 
gesturing events to the appropriate scripting command, is 
populated. That is, since the configuration process has 
defined each gesture with a corresponding action, a database 
or table 300 (Fig. 3) becomes populated with the gesturing 
events 301, state of gesture 302, and corresponding action 
303 and parameter of the action 304, i.e., the content to be 
transmitted for the gesturing event. 
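
One possible in-memory form of such an associative table is sketched below in Python. The name TABLE_300, the tuple layout, and the lookup function are hypothetical; the two rows reuse the wave example given earlier. The sketch assumes a state of "1" for a first occurrence and "*" for any occurrence after the first, and does not attempt the further states (last, non-last, and so on) mentioned above.

```python
from typing import List, Optional, Tuple

# Columns follow table 300: event 301, state 302, action 303, parameter 304.
Row = Tuple[str, str, str, str]

TABLE_300: List[Row] = [
    ("onLeftToRightWave", "1", "announce", "hello, folks"),
    ("onLeftToRightWave", "*", "announce", "Yeah, I'm still here."),
]


def lookup(table: List[Row], event: str, occurrence: int) -> Optional[Tuple[str, str]]:
    """Return (action, parameter) for an event and its 1-based occurrence."""
    wildcard = None
    for row_event, state, action, parameter in table:
        if row_event != event:
            continue
        if state == str(occurrence):
            return action, parameter  # an exact state match takes priority
        if state == "*" and occurrence > 1:
            wildcard = (action, parameter)  # "*" covers later occurrences only
    return wildcard


print(lookup(TABLE_300, "onLeftToRightWave", 1))
print(lookup(TABLE_300, "onLeftToRightWave", 2))
```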

Referring back to Fig. 2, the automatic gesture 
software then asks the participant if there are any more 
gestures that the user wishes to define, 206. If there are, 
the GUI prompts the participant as stated above and 
reiterates steps 204-206. Otherwise, the automatic gesture 
software ends the configuration process, and the camera is 
deactivated. 

As mentioned above, depending upon the embodiment, the 
automatic gesture software may either provide a 
predetermined set of gestures already predefined so that the 
configuration process is eliminated; or the automatic 
gesture software may allow the predetermined set of gestures 
to be tailored as to the corresponding action desired by the 
user; or the automatic gesture software may allow virtually 
any gesture to be made and defined in any way so desired by 
the user. For example, the software may allow a user to 
define a new gesture using the scripting language. Upon 
compiling the definition, the software is instructed (e.g., 
by the word "watchme" as used below) to invoke a training 
session to associate the user's actual gesture with this new 
definition. Thereafter, it can be used to augment the 
predefined vocabulary of gestures, such as: 

defineGesture("deep frown", watchme) 

onGesture["deep frown"](*, announce("I don't know about 
that!!")) 

Some embodiments may also add a censoring feature in 
order to set a certain standard as to the content of a user 
defined action. The censoring feature may further have a 
selectable feature in order to set any content to a level of 
appropriateness based upon age of a child, or the 
environment, such as a religious environment or a business 
environment. 

After the gesture recognition state is initialized 202, 
and the configuration process, if any, is completed 203-206; 
the video camera is activated and starts taking video images 
of the participant 207. As the recording of the video 
begins, the image processing software of the automatic 
gesture software analyzes captured video frames of the 
participant looking for known gesturing events. In essence, 
the image processing software looks for similar changes in 
the video frames of the selected features as defined during 
configuration and set up. For example, a user may want to 
define a wink as a gesture. As such, the user may designate 
the eye as one of the features. The image processing 
software would then analyze a sequence of captured video 
frames to determine when an eyelid is in the closed position 
in a range of 1-3 seconds to differentiate between an 
intended wink as a gesture and the normal blinking of an 
eyelid which remains shut for a relatively smaller amount of 
time than 1 second. To determine this, the image processing 
software would analyze various captured video frames 
covering a second or more in time to determine if an eyelid 
remained closed during such an interval. If so, the image 
processing software would then determine from the video 
analysis that a wink gesturing event occurred. 
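
The timing test for a wink can be illustrated with the following Python sketch. It assumes that some upstream image analysis (not shown, and not specified here) has already reduced each captured frame to a single flag indicating whether the eyelid is closed, and that the frame rate is known. The function name find_winks and its parameters are hypothetical.

```python
def find_winks(eyelid_closed, fps, min_seconds=1.0, max_seconds=3.0):
    """Return (start_frame, end_frame) pairs for eyelid closures whose
    duration falls within min_seconds..max_seconds.

    eyelid_closed: one boolean per captured frame (True = closed).
    fps: frames captured per second.
    A closure still in progress at the end of the input is not reported,
    because its final duration is not yet known.
    """
    winks = []
    run_start = None
    for index, closed in enumerate(eyelid_closed):
        if closed and run_start is None:
            run_start = index  # eyelid has just closed
        elif not closed and run_start is not None:
            duration = (index - run_start) / fps
            if min_seconds <= duration <= max_seconds:
                winks.append((run_start, index - 1))
            run_start = None  # eyelid has reopened
    return winks


# At 10 frames per second: a 0.2 s blink followed by a 1.5 s wink.
frames = [False] * 5 + [True] * 2 + [False] * 5 + [True] * 15 + [False] * 3
print(find_winks(frames, fps=10))
```

Only the longer closure is reported; the short blink falls below the one second threshold.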

Upon determination that a gesturing event occurred 208,
the automatic gesture software would determine the state of
the gesturing event, i.e., whether it is the first
occurrence of the event since initialization or a subsequent
event 209. The automatic gesture software would then access
an associative mapping, e.g., a database of gestures, and
find the corresponding action and parameter of the action,
i.e., the content to be transmitted, for the gesture and the
state of the gesture 210. For example, if the gesturing
event is a left to right hand wave, then depending upon the
current state, the gesturing event is mapped to a command
inputted to the chat room software. As such, for a gesture
associated with the onLeftToRightWave state, the appropriate
action part is selected and executed based on the sequence
or other open states. For example, the "announce" action may
equate to the execution of "send this string across the
network to other participants."
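
The lookup in steps 209-210 can be illustrated with a small
associative mapping keyed by gesture and state. The sketch below is
a hedged example in Python under stated assumptions: the state
labels FIRST and SUBSEQUENT, the GestureMap class, and the message
strings are all invented here, and announce only formats a command
string rather than sending anything over a network.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical state labels: first occurrence of a gesture since
# initialization versus any subsequent occurrence.
FIRST = "first"
SUBSEQUENT = "subsequent"

Action = Callable[[str], str]


def announce(text: str) -> str:
    """Stand-in for sending a string to the other participants.

    A real embodiment would pass the string to the chat room
    software; this sketch only formats the command.
    """
    return f"ANNOUNCE {text}"


class GestureMap:
    """Associative mapping from (gesture, state) to (action, parameter)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, str], Tuple[Action, str]] = {}
        self._seen: Set[str] = set()  # gestures observed since initialization

    def define(self, gesture: str, state: str,
               action: Action, parameter: str) -> None:
        self._table[(gesture, state)] = (action, parameter)

    def on_gesture(self, gesture: str) -> Optional[str]:
        # Step 209: determine the state of the gesturing event.
        state = SUBSEQUENT if gesture in self._seen else FIRST
        self._seen.add(gesture)
        # Step 210: find the action and its parameter, then execute it.
        entry = self._table.get((gesture, state))
        if entry is None:
            return None  # no action configured for this gesture and state
        action, parameter = entry
        return action(parameter)
```

With a left to right wave defined once for each state, the first
wave and a later wave would select different content, which is the
state-dependent behavior the mapping is meant to capture.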

A diagrammatic representation of these steps is:

gesture -> command (state) -> action with chat room software

where "action with chat room software" is typically an
alternative interface to keyboard-actuated or mouse-actuated
functionality in the software. The default interface when
the user types some text and presses the Enter key is "send"
or "announce." The software may provide buttons or
pull-down selection of graphics to be transmitted to
indicate emotion, so these functions may effectively
represent gesture vocabulary for "smile" or "laugh".

In this manner, instead of merely typing the cryptic
phrase "ROTFL", the gesture-aware software may recognize the
user's laughing behavior and send the more complete text,
"I'm rolling on the floor, laughing!"

The gesture software continues to analyze captured
video frames to determine a gesturing event 208-210 until a
signal event to end is received 211, which ends the logic
212. Depending upon the type of signal received, the
process will either begin again at step 201 upon invocation,
or at step 207 if the signal was to merely stop the
activation of the video camera momentarily as discussed
previously.
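
The control flow of steps 207-212 can be sketched as a loop that
runs until a signal arrives and then reports where processing would
resume. This is an illustrative Python sketch only; the Signal enum,
the run_gesture_loop function, and the detect and dispatch callbacks
are hypothetical names introduced for the example.

```python
from enum import Enum
from typing import Any, Callable, Iterable, Optional


class Signal(Enum):
    TERMINATE = "terminate"  # end the logic; a later invocation starts at 201
    PAUSE = "pause"          # camera stopped momentarily; resume at 207


def run_gesture_loop(
    events: Iterable[Any],
    detect: Callable[[Any], Optional[str]],
    dispatch: Callable[[str], None],
) -> int:
    """Analyze frames until a Signal is received.

    events yields captured frames interleaved with Signal values.
    Returns the step number at which the process would begin again.
    """
    for event in events:
        if isinstance(event, Signal):
            # Step 211: a signal event to end was received.
            return 201 if event is Signal.TERMINATE else 207
        gesture = detect(event)   # step 208: did a gesturing event occur?
        if gesture is not None:
            dispatch(gesture)     # steps 209-210: map to an action and run it
    # Frame source exhausted; treated as termination in this sketch.
    return 201
```

Returning the resume step keeps the sketch free of any camera or
network dependency; a caller would use the value to decide whether
to reinitialize or simply reactivate the video camera.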

With reference now to Fig. 4, a block diagram of a data
processing system is shown in which the present invention
may be implemented. Data processing system 400 is an
example of a computer, such as computer 101 in Fig. 1A, in
which code or instructions implementing the process of the
present invention may be located. Data processing system
400 employs a peripheral component interconnect (PCI) local
bus architecture. Although the depicted example employs a
PCI bus, other bus architectures such as Accelerated Graphics
Port (AGP) and Industry Standard Architecture (ISA) may be
used. Processor 402 and main memory 404 are connected to
PCI local bus 406 through PCI bridge 408. PCI bridge 408
also may include an integrated memory controller and cache
memory for processor 402. Additional connections to PCI
local bus 406 may be made through direct component
interconnection or through add-in boards. In the depicted
example, local area network (LAN) adapter 410, small
computer system interface (SCSI) host bus adapter 412, and
expansion bus interface 414 are connected to PCI local bus
406 by direct component connection. In contrast, audio
adapter 416, graphics adapter 418, and audio/video adapter
419 are connected to PCI local bus 406 by add-in boards
inserted into expansion slots. Expansion bus interface 414
provides a connection for a keyboard and mouse adapter 420,
which may be a serial, PS/2, USB or other known adapter,
modem 422, and additional memory 424. SCSI host bus adapter
412 provides a connection for hard disk drive 426, tape
drive 428, and CD-ROM drive 430. Typical PCI local bus
implementations will support three or four PCI expansion
slots or add-in connectors.

An operating system runs on processor 402 and is used
to coordinate and provide control of various components
within data processing system 400 in Fig. 4. The operating
system may be a commercially available operating system such
as Windows 98 or Windows 2000, which are available from
Microsoft Corporation. Instructions for the operating
system and applications or programs are located on storage
devices, such as hard disk drive 426, and may be loaded into
main memory 404 for execution by processor 402.

Those of ordinary skill in the art will appreciate that
the hardware in Fig. 4 may vary depending on the
implementation. Other internal hardware or peripheral
devices, such as flash ROM (or equivalent nonvolatile
memory) or optical disk drives and the like, may be used in
addition to or in place of the hardware depicted in Fig. 4.
Also, the processes of the present invention may be applied
to a multiprocessor data processing system.

For example, data processing system 400, if optionally
configured as a network computer, may not include SCSI host
bus adapter 412, hard disk drive 426, tape drive 428, and
CD-ROM 430, as noted by dotted line 432 in Fig. 4 denoting
optional inclusion. In that case, the computer, to be
properly called a client computer, must include some type of
network communication interface, such as LAN adapter 410,
modem 422, or the like. As another example, data processing
system 400 may be a stand-alone system configured to be
bootable without relying on some type of network
communication interface, whether or not data processing
system 400 comprises some type of network communication
interface. As a further example, data processing system 400
may be a Personal Digital Assistant (PDA) device which is
configured with ROM and/or flash ROM in order to provide
nonvolatile memory for storing operating system files and/or
user-generated data.

The depicted example in Fig. 4 and above-described
examples are not meant to imply architectural limitations.
For example, data processing system 400 also may be a
notebook computer or hand held computer or a telephony
device.

The processes of the present invention are performed by
processor 402 using computer implemented instructions, which
may be located in a memory such as, for example, main memory
404, memory 424, or in one or more peripheral devices
426-430.

With reference now to Fig. 5, a block diagram is shown
illustrating the software organization within data
processing system 400 in Fig. 4 in accordance with a
preferred embodiment of the present invention. Operating
system 502 communicates with automatic gesture software 500.
The operating system communicates with hardware 520 directly
through input/output (I/O) manager 510. I/O manager 510
includes device drivers 512 and network drivers 514. Device
drivers 512 may include a software driver for a printer or
other device, such as a display, fax, modem, sound card,
etc. The operating system receives input from the user
through hardware 520. Automatic gesture software 500 sends
information to and receives information from a network, such
as the Internet, by communicating with network drivers 514
through I/O manager 510. The automatic gesture software 500
may be located on storage devices, such as hard disk drive
426, and may be loaded into main memory 404 for execution by
processor 402, as shown in Fig. 4.

In this embodiment, automatic gesture software 500
includes a graphical user interface (GUI) 510, which allows
the user to interface with the software 500. This interface
provides for selection of various functions through menus
and allows for manipulation of elements displayed within the
user interface by use of a mouse or other input device. For
example, a menu may allow a user to perform various
functions including configuring and correlating gestures
with actions, and initiating or terminating the video
processing.

Automatic gesture software 500 also includes image
processing software 531, as described previously herein,
which receives input from hardware device 520, i.e., a video
camera, via the I/O manager 510 and operating system 502.
When automatic gesture software determines that image
processing software 531 has identified a gesturing event,
the automatic gesture software converts the gesture 532 to
an API of chat room software 540.

The exemplary embodiments shown in the figures are
provided solely for the purposes of explaining the preferred
embodiments of the invention; and those skilled in the art
will recognize that numerous variations are possible, both
in form and function.

The preferred embodiments may be implemented as a
method, system, or article of manufacture using standard
programming and/or engineering techniques to produce
software, firmware, hardware, or any combination thereof.
The term "article of manufacture" (or alternatively,
"computer program product") as used herein is intended to
encompass data, instructions, program code, and/or one or
more computer programs, and/or data files accessible from
one or more computer usable devices, carriers, or media.
Examples of computer usable mediums include, but are not
limited to: nonvolatile, hard-coded type mediums such as
CD-ROMs, DVDs, read only memories (ROMs) or electrically
erasable programmable read only memories (EEPROMs),
recordable type mediums such as floppy disks, hard disk
drives and CD-RW and DVD-RW disks, and transmission type
mediums such as digital and analog communication links, or
any signal bearing media. As such, the functionality of the
above described embodiments of the invention can be
implemented in hardware in a computer system and/or in
software executable in a processor, namely, as a set of
instructions (program code) in a code module resident in the
random access memory of the computer. Until required by the
computer, the set of instructions may be stored in another
computer memory, for example, in a hard disk drive, or in a
removable memory such as an optical disk (for use in a
CD-ROM drive) or a floppy disk (for eventual use in a floppy
disk drive), or downloaded via the Internet or other computer
network, as discussed above. The present invention applies
equally regardless of the particular type of signal-bearing
media utilized.

The foregoing description of the preferred embodiments
of the invention has been presented for the purposes of
illustration and description. It is not intended to be
exhaustive or to limit the invention to the precise form
disclosed. Many modifications and variations are possible in
light of the above teaching. For example, although
preferred embodiments of the invention have been described
in terms of the Internet, other network environments
including but not limited to wide area networks, intranets,
and dial up connectivity systems using any network protocol
that provides basic data transfer mechanisms may be used.
Also, although the preferred embodiment has been described
with reference to chat room software, the preferred
embodiment may also be used in conjunction with other
software for enabling other types of communication such as
instant messaging, telephony communication, other audio
communication, conferencing, etc.

It is intended that the scope of the invention be
limited not by this detailed description, but rather by the
claims appended hereto. The above specification, examples
and data provide a complete description of the manufacture
and use of the system, method, and article of manufacture,
i.e., computer program product, of the invention. Since
many embodiments of the invention can be made without
departing from the spirit and scope of the invention, the
invention resides in the claims hereinafter appended.

Having thus described the invention, what we claim as 
new and desire to secure by Letters Patent is set forth in 
the following claims. 



